virtual keyboard


Multimodal Appearance-based Gaze-Controlled Virtual Keyboard with Synchronous and Asynchronous Interaction for Low-Resource Settings

Meena, Yogesh Kumar, Salvi, Manish

arXiv.org Artificial Intelligence

Over the past decade, the demand for communication devices has increased among individuals with mobility and speech impairments. Eye-gaze tracking has emerged as a promising solution for hands-free communication; however, traditional appearance-based interfaces often face challenges such as accuracy issues, involuntary eye movements, and difficulties with extensive command sets. This work presents a multimodal appearance-based gaze-controlled virtual keyboard that utilises deep learning in conjunction with standard camera hardware, incorporating both synchronous and asynchronous modes for command selection. The virtual keyboard application supports menu-based selection with nine commands, enabling users to spell and type up to 56 English characters, including uppercase and lowercase letters, punctuation, and a delete function for corrections. The proposed system was evaluated with twenty able-bodied participants who completed specially designed typing tasks using three input modalities: (i) a mouse, (ii) an eye-tracker, and (iii) an unmodified webcam. Typing performance was measured in terms of speed and information transfer rate (ITR) at both command and letter levels. Average typing speeds were 18.3 ± 5.31 letters/min (mouse), 12.60 ± 2.99 letters/min (eye-tracker, synchronous), 10.94 ± 1.89 letters/min (webcam, synchronous), 11.15 ± 2.90 letters/min (eye-tracker, asynchronous), and 7.86 ± 1.69 letters/min (webcam, asynchronous). ITRs were approximately 80.29 ± 15.72 bits/min (command level) and 63.56 ± 11 bits/min (letter level) with the webcam in synchronous mode. The system demonstrated good usability and low workload with webcam input, highlighting its user-centred design and promise as an accessible communication tool in low-resource settings.
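Command- and letter-level ITRs of this kind are conventionally computed with Wolpaw's formula. The abstract does not state which ITR definition the authors use, so the sketch below assumes the standard Wolpaw variant, with purely illustrative accuracy and selection-rate values (not figures from the paper).

    import math

    def wolpaw_itr(n_targets: int, accuracy: float, selections_per_min: float) -> float:
        """Wolpaw information transfer rate in bits/min.

        n_targets: number of selectable commands (e.g. the 9 menu commands).
        accuracy: probability of a correct selection; must exceed 1/n_targets.
        selections_per_min: observed selection rate.
        """
        if accuracy >= 1.0:
            bits_per_selection = math.log2(n_targets)
        else:
            bits_per_selection = (
                math.log2(n_targets)
                + accuracy * math.log2(accuracy)
                + (1 - accuracy) * math.log2((1 - accuracy) / (n_targets - 1))
            )
        return bits_per_selection * selections_per_min

    # Illustrative values only (not taken from the paper):
    print(wolpaw_itr(n_targets=9, accuracy=0.95, selections_per_min=28))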


Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training

Shen, Junxiao, Khaldi, Khadija, Zhou, Enmin, Surale, Hemant Bhaskar, Karlson, Amy

arXiv.org Artificial Intelligence

Text entry with word-gesture keyboards (WGK) is emerging as a popular method and becoming a key interaction for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments introduces divergent word-gesture trajectory data patterns, thus leading to complexity in decoding trajectories into text. Template-matching decoding methods, such as SHARK^2, are commonly used for these WGK systems because they are easy to implement and configure. However, these methods are susceptible to decoding inaccuracies for noisy trajectories. While conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, they have their own limitations: they require extensive data for training and deep-learning expertise for implementation. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that is generalizable across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It significantly outperforms SHARK^2, with a 37.2% improvement, and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder's size is only 4 MB after quantization, without sacrificing accuracy, and it can operate in real-time, executing in just 97 milliseconds on Quest 3.
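The abstract does not spell out the discretization scheme, so the following is only a plausible sketch of coarse trajectory discretization: normalized keyboard-relative (x, y) points are snapped to a small grid of cell tokens, with consecutive duplicates collapsed. The grid size and coordinate convention are assumptions, not the paper's method.

    from typing import List, Tuple

    def coarse_discretize(trajectory: List[Tuple[float, float]],
                          grid_w: int = 8, grid_h: int = 4) -> List[int]:
        """Map a normalized word-gesture trajectory ((x, y) in [0, 1]^2,
        keyboard-relative) to a sequence of coarse grid-cell tokens.
        Collapsing consecutive duplicates keeps sequences short and makes
        trajectories from different devices and sampling rates comparable."""
        tokens: List[int] = []
        for x, y in trajectory:
            col = min(int(x * grid_w), grid_w - 1)
            row = min(int(y * grid_h), grid_h - 1)
            tok = row * grid_w + col
            if not tokens or tokens[-1] != tok:  # drop consecutive repeats
                tokens.append(tok)
        return tokens

    # A rough three-point swipe (illustrative, not real data):
    print(coarse_discretize([(0.55, 0.40), (0.60, 0.42), (0.72, 0.35)]))  # [12, 13]

A token sequence like this is the kind of input a pre-trained sequence decoder could consume, which is what makes the representation transferable across mid-air and on-surface input.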


Apple Vision Pro's Eye Tracking Exposed What People Type

WIRED

You can tell a lot about someone from their eyes. They can indicate how tired you are, the type of mood you're in, and potentially provide clues about health problems. Today, a group of six computer scientists is revealing a new attack against Apple's Vision Pro mixed reality headset, in which exposed eye-tracking data allowed them to decipher what people entered on the device's virtual keyboard. The attack, dubbed GAZEploit and shared exclusively with WIRED, allowed the researchers to successfully reconstruct passwords, PINs, and messages people typed with their eyes. "Based on the direction of the eye movement, the hacker can determine which key the victim is now typing," says Hanqiu Wang, one of the leading researchers involved in the work.


EEG Right & Left Voluntary Hand Movement-based Virtual Brain-Computer Interfacing Keyboard with Machine Learning and a Hybrid Bi-Directional LSTM-GRU Model

Paneru, Biplov, Paneru, Bishwash, Sapkota, Sanjog Chhetri

arXiv.org Artificial Intelligence

This study focuses on an EEG-based BMI for detecting voluntary keystrokes, aiming to develop a reliable brain-computer interface (BCI) to simulate and anticipate keystrokes, especially for individuals with motor impairments. The methodology includes extensive segmentation, event alignment, ERP plot analysis, and signal analysis. Machine learning models are trained to classify EEG data into three categories: 'resting state' (0), 'd' key press (1), and 'l' key press (2). Real-time keypress simulation based on neural activity is enabled through integration with a tkinter-based graphical user interface. Feature engineering utilized ERP windows, and the SVC model achieved 90.42% accuracy in event classification. Additionally, several models -- MLP (89% accuracy), CatBoost (87.39%), KNN (72.59%), Gaussian Naive Bayes (79.21%), logistic regression (90.81%), and a novel bi-directional LSTM-GRU hybrid model (89%) -- were developed for BCI keyboard simulation. Finally, a GUI was created to predict and simulate keystrokes using the trained MLP model.
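The paper's exact architecture is not given in this abstract, so the Keras sketch below only illustrates what a bi-directional LSTM-GRU hybrid for the three-class task could look like; all layer sizes and the input window shape are assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_bilstm_gru(n_timesteps: int, n_channels: int, n_classes: int = 3):
        """Hypothetical Bi-LSTM + GRU classifier for EEG windows labelled
        resting state (0), 'd' press (1), or 'l' press (2)."""
        model = models.Sequential([
            layers.Input(shape=(n_timesteps, n_channels)),
            layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
            layers.GRU(32),            # condenses the Bi-LSTM sequence output
            layers.Dropout(0.3),
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_bilstm_gru(n_timesteps=256, n_channels=8)  # assumed window shape
    model.summary()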


Bayesian inference on Brain-Computer Interface using the GLASS Model

Zhao, Bangyao, Huggins, Jane E., Kang, Jian

arXiv.org Machine Learning

The brain-computer interface (BCI) enables individuals with severe physical impairments to communicate with the world. BCIs offer computational neuroscience opportunities and challenges in converting real-time brain activities to computer commands and are typically framed as a classification problem. This article focuses on the P300 BCI that uses the event-related potential (ERP) BCI design, where the primary challenge is classifying target/non-target stimuli. We develop a novel Gaussian latent group model with sparse time-varying effects (GLASS) for making Bayesian inferences on the P300 BCI. GLASS adopts a multinomial regression framework that directly addresses the dataset imbalance in BCI applications. The prior specifications facilitate (i) feature selection and noise reduction using soft-thresholding, (ii) smoothing of the time-varying effects using global shrinkage, and (iii) clustering of latent groups to alleviate high spatial correlations of EEG data. We develop an efficient gradient-based variational inference (GBVI) algorithm for posterior computation and provide a user-friendly Python module available at https://github.com/BangyaoZhao/GLASS. The application of GLASS identifies important EEG channels (PO8, Oz, PO7, Pz, C3) that align with existing literature. GLASS further reveals a group effect from channels in the parieto-occipital region (PO8, Oz, PO7), which is validated in cross-participant analysis.
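Of the three prior mechanisms, the soft-thresholding in (i) has a single standard form, reproduced below; how GLASS embeds it in the prior (and the global-shrinkage and clustering components) is specific to the paper, so this is just the generic operator.

    import numpy as np

    def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
        """Soft-thresholding: S_lam(x) = sign(x) * max(|x| - lam, 0).
        Coefficients with |x| <= lam are zeroed (feature selection);
        the rest shrink toward zero (noise reduction)."""
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    beta = np.array([0.05, -0.8, 1.3, -0.02])
    print(soft_threshold(beta, lam=0.1))   # [ 0.  -0.7  1.2 -0. ]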


Effective Gesture Based Framework for Capturing User Input

Charan, Pabbathi Sri, Gupta, Saksham, Agrawal, Satvik, Sindhu, Gadupudi Sahithi

arXiv.org Artificial Intelligence

Computers today aren't confined to laptops and desktops; mobile gadgets such as phones make use of computing as well. However, one input device that hasn't changed in the last 50 years is the QWERTY keyboard. Thanks to sensor technology and artificial intelligence, users of virtual keyboards can type on any surface as if it were a keyboard. In this research, we use image processing to create an application that renders a virtual computer keyboard, built on a novel framework that detects hand gestures with high accuracy while also being sustainable and financially viable. A camera captures images of the keyboard area and finger movements, which together act as a virtual keyboard. In addition, a visible virtual mouse that accepts finger coordinates as input is also described in this study. This system has the direct benefits of reducing peripheral cost, reducing the electronic waste generated by external devices, and providing accessibility to people who cannot use a traditional keyboard and mouse.
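The abstract does not describe the detection pipeline, so the sketch below substitutes MediaPipe hand tracking and a hypothetical 3x10 key grid purely to illustrate the core idea: a camera-tracked fingertip position is mapped to a virtual key.

    import cv2
    import mediapipe as mp

    # Hypothetical key layout; the paper's actual layout is not specified.
    ROWS = ["QWERTYUIOP", "ASDFGHJKL;", "ZXCVBNM,./"]

    def fingertip_to_key(x_norm: float, y_norm: float) -> str:
        """Map a normalized index-fingertip position to a key in the grid."""
        row = min(int(y_norm * len(ROWS)), len(ROWS) - 1)
        col = min(int(x_norm * len(ROWS[row])), len(ROWS[row]) - 1)
        return ROWS[row][col]

    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            tip = result.multi_hand_landmarks[0].landmark[8]  # index fingertip
            print("hovered key:", fingertip_to_key(tip.x, tip.y))
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()

A real system would also need a keypress trigger, such as a dwell time or a fingertip-tap depth heuristic, rather than reporting the hovered key on every frame.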


Amazon Announces 'DeepComposer,' the World's First Machine-Learning USB Musical Keyboard for Developers of All Skill Levels - Grit Daily

#artificialintelligence

As consumers go on their Cyber Monday shopping frenzy, tech enthusiasts are spending their time in Las Vegas at the AWS re:Invent conference, where Amazon unveils the latest gadgets and gizmos it has in the works. On Monday, day one of the conference, Amazon Web Services (AWS) and Julien Simon announced the world's first machine-learning-enabled musical keyboard for developers: AWS DeepComposer. Powered by machine learning, AWS DeepComposer allows developers of all skill levels to learn "generative AI" while creating original music. In the world of artificial intelligence (AI), the most rapidly growing areas include computer vision, natural language processing, and, of course, generative AI. Generative AI is one of the biggest recent advancements in AI technology because of its ability to create something new by utilizing "generative adversarial networks."


Dibakar Saha Talks About His Image Processing and Machine Learning Projects. - Cool Python Codes

#artificialintelligence

Do you know OpenCV, machine learning, and image processing, but find it difficult to come up with cool, amazing projects? Basically, he is a beginner in Python with experience in image processing and a little in machine learning. He has designed very simple classification programs, like spam detection and sentiment analysis, using machine learning in Python. Using image processing, he has also designed a very simple gesture recognition system, as well as a gesture-driven keyboard. Presently he is working on an app he calls NFS Most Wanted 2013 Remote, which can control the cars in the game using your phone's accelerometer. He also revealed some tips that will help a lot of programmers out there, especially newbies.


The top 10 PC technologies and trends to watch in 2017

PCWorld

Though some critics love to knock PCs as dinosaurs, laptops and desktops have gotten sexier, faster, and even smarter. For every blue screen of death, there are droves of technological enhancements driving PCs into the era of virtual reality, 4K video, and 5G connectivity. Here are the top 10 PC technologies and trends to watch next year. An Intel employee demonstrates the company's Project Alloy headset on stage during IDF 2016 in San Francisco on August 16, 2016. VR devices will come in many new shapes and sizes, with some of them acting essentially as PCs that fit on your head.


Xbox One Software Update In Preview Fixes Messaging, Cortana, Virtual Keyboard And More

International Business Times

Microsoft has released a new software update for the Xbox One console, and it comes with fixes that improve the overall gaming experience. Unfortunately, this update is only available to preview members. According to DualShockers, the system software update has a build code of rs1_xbox_rel_1610.161103-1900. When it comes to Messaging, users who are unable to send messages will now see a dialog that contains information on why the message cannot be sent. The dialog will also direct users to Xbox Support.